TAUoverSupermon: Low-Overhead Online Parallel Performance Monitoring

Authors

  • Aroon Nataraj
  • Matthew J. Sottile
  • Alan Morris
  • Allen D. Malony
  • Sameer Shende

Abstract

Online application performance monitoring tracks performance characteristics during execution rather than post-mortem. This enables capabilities that are otherwise unavailable, such as real-time visualization and application performance steering, which are useful for long-running applications. As HPC systems grow in size and complexity, the key challenge is to keep the online performance monitor scalable and low-overhead while still providing a useful performance reporting capability. Two fundamental components of such a monitor are the measurement system and the transport system. We adapt and combine two existing, mature systems, TAU and Supermon, to address this problem: TAU performs the measurement, while Supermon collects the distributed measurement state. Our experiments show that this novel approach yields very low-overhead application monitoring, as well as other benefits unavailable when using a transport such as NFS.
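
The abstract splits an online monitor into a measurement component (TAU) and a transport component (Supermon) that gathers the distributed measurement state. The Python sketch below illustrates only that division of labor, under stated assumptions: the names `Profile`, `timed`, `to_sexpr` and `collect` are hypothetical and are not TAU's or Supermon's actual APIs, and the s-expression-style text encoding is chosen purely for illustration. Each process accumulates call counts and inclusive times in memory; a separate collector pulls snapshots and merges them, instead of each process writing profile files to a shared filesystem.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical per-process profile: region name -> [calls, inclusive seconds]."""
    counters: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0.0]))

    def record(self, name, seconds):
        # Measurement side: update in-memory counters only (no I/O on the hot path).
        entry = self.counters[name]
        entry[0] += 1
        entry[1] += seconds

    def snapshot(self):
        # Copy the current state so the application can keep measuring while
        # the transport ships the copy elsewhere.
        return {name: tuple(v) for name, v in self.counters.items()}


@contextmanager
def timed(profile, name):
    """Time a code region and record it into `profile`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        profile.record(name, time.perf_counter() - start)


def to_sexpr(rank, snap):
    """Encode one rank's snapshot as an illustrative s-expression string."""
    items = " ".join(
        f"({name} {calls} {secs:.6f})" for name, (calls, secs) in sorted(snap.items())
    )
    return f"(rank {rank} {items})"


def collect(snapshots):
    """Collector side: merge per-rank snapshots into per-region totals."""
    totals = defaultdict(lambda: [0, 0.0])
    for snap in snapshots.values():
        for name, (calls, secs) in snap.items():
            totals[name][0] += calls
            totals[name][1] += secs
    return {name: tuple(v) for name, v in totals.items()}


if __name__ == "__main__":
    # Simulate two "ranks" that each time a compute region.
    profiles = {rank: Profile() for rank in range(2)}
    for rank, prof in profiles.items():
        for _ in range(rank + 1):
            with timed(prof, "compute"):
                sum(range(10_000))
    # The collector polls in-memory snapshots rather than reading files.
    snaps = {rank: prof.snapshot() for rank, prof in profiles.items()}
    for rank, snap in snaps.items():
        print(to_sexpr(rank, snap))
    print(collect(snaps))
```

Keeping `record` free of I/O and moving encoding and aggregation to the collector is one simple way to keep per-event overhead low; it mirrors, in a much simplified form, the contrast the abstract draws with a file-based transport such as NFS.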





Publication date: 2007